AITopics | claude 3

Country:

North America > United States > California > Santa Clara County > Palo Alto (0.04)
South America > Peru > Cusco Department > Cusco Province > Cusco (0.04)
Asia > Japan (0.04)
(3 more...)

Genre: Research Report > New Finding (0.46)

Industry:

Health & Medicine (1.00)
Law (0.67)
Education > Educational Setting (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.35)

Neural Information Processing SystemsFeb-18-2026, 13:04:58 GMT

InfiBench: Evaluating the Question-Answering Capabilities of Code Large Language Models

With the rapid development of code LLMs, many popular evaluation benchmarks, such as HumanEval, DS-1000, and MBPP, have emerged to measure the performance of code LLMs with a particular focus on code generation tasks. However, they are insufficient to cover the full range of expected capabilities of code LLMs, which span beyond code generation to answering diverse coding-related questions.

benchmark, large language model, machine learning, (19 more...)

Country:

Asia > Singapore (0.04)
Asia > China > Hong Kong (0.04)

Industry: Information Technology > Security & Privacy (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Neural Information Processing SystemsFeb-18-2026, 06:00:58 GMT

d0718553fd6b227a353c6432cf893285-Paper-Datasets_and_Benchmarks_Track.pdf

large language model, machine learning, programming language, (23 more...)

Country:

North America > United States > California > Santa Clara County > Palo Alto (0.05)
Oceania > Australia (0.04)
North America > Montserrat (0.04)
(2 more...)

Genre: Research Report (1.00)

Industry:

Information Technology (0.67)
Government (0.67)
Law > Intellectual Property & Technology Law (0.46)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Information Management (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
(4 more...)

Neural Information Processing SystemsFeb-18-2026, 05:20:49 GMT

cdf6f8e9fd9aeaf79b6024caec24f15b-Paper-Datasets_and_Benchmarks_Track.pdf

To ensure quality, all charts and questions are handpicked, curated, and verified by human experts.

large language model, machine learning, natural language, (20 more...)

Country:

Europe > Italy (0.04)
Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
South America > Colombia > Meta Department > Villavicencio (0.04)
(9 more...)

Genre: Research Report > New Finding (0.67)

Industry:

Law (1.00)
Banking & Finance (0.92)
Government (0.92)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
(5 more...)

Neural Information Processing SystemsFeb-15-2026, 20:28:27 GMT

The ALCHEmist: Automated Labeling 500x CHEaper Than LLM Data Annotators

Large pretrained models can be used as annotators, helping replace or augment crowdworkers and enabling distilling generalist models into smaller specialist models.

large language model, machine learning, natural language, (18 more...)

Country:

North America > United States > Wisconsin > Dane County > Madison (0.04)
Europe > Slovenia > Drava > Municipality of Benedikt > Benedikt (0.04)
Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.04)

Genre:

Research Report > Experimental Study (0.93)
Research Report > New Finding (0.67)

Industry:

Information Technology (0.93)
Health & Medicine > Therapeutic Area > Oncology (0.67)
Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.97)

Neural Information Processing SystemsDec-27-2025, 14:37:41 GMT

VHELM: A Holistic Evaluation of Vision Language Models

Current benchmarks for assessing vision-language models (VLMs) often focus on their perception or problem-solving capabilities and neglect other critical aspects such as fairness, multilinguality, or toxicity. Furthermore, they differ in their evaluation procedures and the scope of the evaluation, making it difficult to compare models. To address these issues, we extend the HELM framework to VLMs to present the Holistic Evaluation of Vision Language Models (VHELM). VHELM aggregates various datasets to cover one or more of the 9 aspects:,,,,,,,, and . In doing so, we produce a comprehensive, multi-dimensional view of the capabilities of the VLMs across these important factors.

artificial intelligence, natural language, proceedings, (9 more...)

Country: North America > United States > California > Santa Clara County > Palo Alto (0.07)

Technology: Information Technology > Artificial Intelligence > Natural Language (0.90)

Guilbert, Salomé, Masschelein, Cassandra, Goumaz, Jeremy, Naida, Bohdan, Schwaller, Philippe

DynaMate: An Autonomous Agent for Protein-Ligand Molecular Dynamics Simulations

arXiv.org Artificial IntelligenceDec-12-2025

Force field-based molecular dynamics (MD) simulations are indispensable for probing the structure, dynamics, and functions of biomolecular systems, including proteins and protein-ligand complexes. Despite their broad utility in drug discovery and protein engineering, the technical complexity of MD setup, encompassing parameterization, input preparation, and software configuration, remains a major barrier for widespread and efficient usage. Agentic LLMs have demonstrated their capacity to autonomously execute multi-step scientific processes, and to date, they have not successfully been used to automate protein-ligand MD workflows. Here, we present DynaMate, a modular multi-agent framework that autonomously designs and executes complete MD workflows for both protein and protein-ligand systems, and offers free energy binding affinity calculations with the MM/PB(GB)SA method. The framework integrates dynamic tool use, web search, PaperQA, and a self-correcting behavior. DynaMate comprises three specialized modules, interacting to plan the experiment, perform the simulation, and analyze the results. We evaluated its performance across twelve benchmark systems of varying complexity, assessing success rate, efficiency, and adaptability. DynaMate reliably performed full MD simulations, corrected runtime errors through iterative reasoning, and produced meaningful analyses of protein-ligand interactions. This automated framework paves the way toward standardized, scalable, and time-efficient molecular modeling pipelines for future biomolecular and drug design applications.

large language model, machine learning, simulation, (22 more...)

2512.10034

Country:

North America > United States > California (0.04)
Europe > Switzerland > Vaud > Lausanne (0.04)

Genre:

Workflow (1.00)
Research Report > New Finding (0.46)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Artificial IntelligenceDec-11-2025

Financial Instruction Following Evaluation (FIFE)

Matlin, Glenn, Siddharth, null, JM, Anirudh, Shukla, Aditya, Hassan, Yahya, Chava, Sudheer

Language Models (LMs) struggle with complex, interdependent instructions, particularly in high-stakes domains like finance where precision is critical. We introduce FIFE, a novel, high-difficulty benchmark designed to assess LM instruction-following capabilities for financial analysis tasks. FIFE comprises 88 human-authored prompts and employs a verification system with chainable, verifiable constraints for fine-grained reward signals. We evaluate 53 models (proprietary, open-weight, open-source) in a zero-shot setting. Our key findings reveal a clear performance hierarchy: the top open-weight model (76.1 strict / 79.5 loose) surpasses the leading proprietary system (65.9 strict / 70.5 loose), while the best open-source models lag significantly (45.5 strict / 48.9 loose). However, even top-performing models struggle with FIFE's complex requirements, failing to achieve perfect compliance. We release our dataset and code as an open-source resource to promote research in Reinforcement Learning for the financial domain.

fin, large language model, machine learning, (19 more...)

2512.08965

Country:

Europe > Austria > Vienna (0.14)
North America > United States (0.14)

Genre: Research Report (0.87)

Industry: Banking & Finance (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.75)

Piedrahita, David Guzman, Strauss, Irene, Schölkopf, Bernhard, Mihalcea, Rada, Jin, Zhijing

Democratic or Authoritarian? Probing a New Dimension of Political Biases in Large Language Models

arXiv.org Artificial IntelligenceDec-9-2025

As Large Language Models (LLMs) become increasingly integrated into everyday life and information ecosystems, concerns about their implicit biases continue to persist. While prior work has primarily examined socio-demographic and left--right political dimensions, little attention has been paid to how LLMs align with broader geopolitical value systems, particularly the democracy--authoritarianism spectrum. In this paper, we propose a novel methodology to assess such alignment, combining (1) the F-scale, a psychometric tool for measuring authoritarian tendencies, (2) FavScore, a newly introduced metric for evaluating model favorability toward world leaders, and (3) role-model probing to assess which figures are cited as general role-models by LLMs. We find that LLMs generally favor democratic values and leaders, but exhibit increased favorability toward authoritarian figures when prompted in Mandarin. Further, models are found to often cite authoritarian figures as role models, even outside explicit political contexts. These results shed light on ways LLMs may reflect and potentially reinforce global political ideologies, highlighting the importance of evaluating bias beyond conventional socio-political axes. Our code is available at: https://github.com/irenestrauss/Democratic-Authoritarian-Bias-LLMs.

large language model, machine learning, natural language, (19 more...)

2506.12758

Country:

North America > Cuba (0.14)
North America > Canada > Ontario > Toronto (0.14)
Asia > Middle East > Syria (0.14)
(185 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)
Questionnaire & Opinion Survey (1.00)

Industry:

Law (0.67)
Government > Regional Government > Asia Government > Middle East Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.99)

Spanos, Nikolaos, Lymperaiou, Maria, Filandrianos, Giorgos, Thomas, Konstantinos, Voulodimos, Athanasios, Stamou, Giorgos

V-CECE: Visual Counterfactual Explanations via Conceptual Edits

arXiv.org Artificial IntelligenceDec-8-2025

Recent black-box counterfactual generation frameworks fail to take into account the semantic content of the proposed edits, while relying heavily on training to guide the generation process. We propose a novel, plug-and-play black-box counterfactual generation framework, which suggests step-by-step edits based on theoretical guarantees of optimal edits to produce human-level counterfactual explanations with zero training. Our framework utilizes a pre-trained image editing diffusion model, and operates without access to the internals of the classifier, leading to an explainable counterfactual generation process. Throughout our experimentation, we showcase the explanatory gap between human reasoning and neural model behavior by utilizing both Convolutional Neural Network (CNN), Vision Transformer (ViT) and Large Vision Language Model (LVLM) classifiers, substantiated through a comprehensive human evaluation.

classifier, machine learning, natural language, (20 more...)